Real-Time Detection and Redirection from Counterfeit Websites

ABSTRACT

Counterfeit uniform resource locators (URLs) are detected and blocked in real-time by a browser extension in communication with a counterfeit URL detection system. The browser extension receives a URL requested within a browser application. Content from a webpage associated with the received URL is extracted and transmitted to the counterfeit URL detection system, which is configured to analyze the content and return an assessment indicating whether the URL is counterfeit. If the assessment indicates that the URL is counterfeit, the browser extension blocks the browser application from accessing content associated with the URL and redirects the browser application to a legitimate URL.

CROSS-REFERENCE TO RELATED APPLICATION

This disclosure is a continuation application of U.S. patent application Ser. No. 16/694,786, filed Nov. 25, 2019, listing as first inventor Shashi Prakash, titled “Real-Time Detection and Redirection from Counterfeit Websites,” which in turn is a continuation-in-part application of U.S. patent application Ser. No. 16/260,994, filed Jan. 29, 2019, listing as first inventor Shashi Prakash, titled “Real-Time Detection and Blocking of Counterfeit Websites,” which in turn claims the benefit of U.S. Provisional Patent Application No. 62/628,894, filed Feb. 9, 2018, listing as first inventor Shashi Prakash, titled “System to Detect and Block Counterfeit Web sites in Real-Time,” each of which is hereby incorporated herein by reference.

TECHNICAL FIELD

This disclosure relates to detecting and blocking access to counterfeit websites in real time.

BACKGROUND

Counterfeit websites are used for a variety of nefarious purposes. These websites are created with intent to make users believe they are using a legitimate site of a known entity, deceiving the users into providing sensitive personal or financial information or downloading potentially dangerous files. In some cases, links to counterfeit websites may be sent to the user in a message, such as an email, SMS message, or instant message. In other circumstances, a nefarious website may have an address similar to that of a popular, trusted website, such that a user is directed to the nefarious website if the user mistypes the address of the popular website into a browser. Because the harm that these counterfeit websites or their operators can cause to a user may be severe, it is desirable to block access to these websites.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an environment in which counterfeit website detection is performed, according to one embodiment.

FIG. 2 is a block diagram illustrating functional modules within a browser extension, according to one embodiment.

FIG. 3 is a block diagram illustrating functional modules within a counterfeit URL detection system, according to one embodiment.

FIG. 4 is a flowchart illustrating a process for blocking user access to counterfeit websites in real-time, according to one embodiment.

FIG. 5 is a flowchart illustrating a process for analyzing whether URLs are counterfeit, according to one embodiment.

FIG. 6 is a flowchart illustrating a process for analyzing whether URLs are counterfeit and redirecting a browser application to a legitimate URL, according to one embodiment.

FIG. 7 illustrates a block diagram of analytics that may be generated, according to one embodiment.

FIG. 8 is a block diagram illustrating an example of a processing system in which at least some operations described herein can be implemented.

DETAILED DESCRIPTION

System Overview

Counterfeit uniform resource locators (URLs) are detected and blocked in real-time by a browser extension in communication with a counterfeit URL detection system. The browser extension, configured for example as an extension within a web browser, email client, or mobile application, protects users against nefarious websites by intercepting a request to access a counterfeit URL and blocking the web browser, email client, or mobile application from accessing the nefarious content. In some embodiments, the browser extension receives a URL requested within a browser application. Content from a webpage associated with the received URL is extracted and transmitted to the counterfeit URL detection system, which is configured to analyze the content and return an assessment indicating whether the URL is counterfeit. If the assessment indicates that the URL is counterfeit, the browser extension blocks the browser application from accessing content associated with the URL.

As used herein, a “counterfeit URL” refers to an address that references an untrusted webpage. These webpages may exhibit nefarious behaviors, such as phishing for sensitive information from a user or causing malicious content to be downloaded to a user's device, or may emulate other websites in order to deceive users into believing that the webpage is affiliated with a trusted source. Some counterfeit URLs may mimic the URL of a well-known website so that the user believes she is accessing the well-known website. For example, if a user is familiar with www.example.com, the user may believe she is accessing the familiar webpage when in reality she is requesting the counterfeit URL www.examp1e.com. Other counterfeit URLs may redirect the browser to nefarious webpages, such that a user's careful inspection of the requested URL may not reveal information about the webpage ultimately displayed by the browser.

FIG. 1 is a block diagram illustrating an environment in which counterfeit website detection is performed, according to one embodiment. As shown in FIG. 1, the environment can include a user device 110, one or more third-party servers 120, and a counterfeit URL detection system 130 communicating over a network 140. The network 140 enables communication between the user device 110, third-party servers 120, and counterfeit URL detection system 130, and may include one or more local area networks (LANs), wide-area networks (WANs), metropolitan area networks (MANs), and/or the Internet.

The user device 110 is a computing device used by a user to access content over the network 140 and can be any device capable of displaying electronic content and communicating over the network 140, such as a desktop computer, laptop or notebook computer, mobile phone, tablet, eReader, television set, or set top box. In some cases, the user device 110 can be configured as part of an enterprise, representing a plurality of user devices 110 associated with an organization such as a company.

The user device 110 executes a browser application 112, comprising software that when executed by the user device 110 retrieves and displays electronic documents. Other applications can additionally be executed by the user device 110, such as an email application, a short messaging service (SMS) application, or other applications capable of receiving and sending electronic messages.

As used herein, the browser application 112 can refer to any application capable of retrieving electronic content over the network 140, including web browsers, mobile applications, or email applications. The browser application 112 includes a user interface enabling users to interact with electronic content by, for example, displaying the content to the user, providing a navigation or address bar for users to input URLs to request desired content, and rendering selectable hyperlinks embedded within content that can be selected to cause the browser application 112 to retrieve additional content. The browser application 112 may also include a networking engine that retrieves content associated with a URL when the URL is requested by explicit user action or by a call from an external application. For example, a user may explicitly request that the browser application 112 access a URL by typing or pasting a copied URL into an address bar in the browser user interface. As another example, if a user selects a hyperlink in an email that contains a URL, the email application may generate a call to the browser application 112 to cause the browser 112 to access a webpage identified by the URL.

A browser extension 116 operates within or parallel to the browser application 112 and extends functionality of the browser application 112. The browser extension 116, which for example can comprise computer program instructions provided by the counterfeit URL detection system 130 and executable by a processor of the user device 110, can receive a URL requested by the browser application 112. Before the browser application 112 retrieves and displays content associated with the webpage identified by the URL, the browser extension 116 determines whether the URL is counterfeit. If the URL is determined to be counterfeit, the extension 116 blocks the browser application 112 from displaying the webpage content. If the page is determined to not be counterfeit, the extension 116 allows the browser application 112 to display the content (for example, by taking no action to block the content). The browser extension 116 is described further with respect to FIG. 2.

The third-party servers 120 store electronic content and serve the content to the user device 110 when requested. The third-party servers 120 can be computing devices associated with any of a variety of sources of content that may be requested by a user, such as banks, online retailers, or government entities. Some of the third-party servers 120 may be associated with a malicious actor and serve counterfeit websites that are designed to look like or deceive users into believing they are associated with a trusted content source.

The counterfeit URL detection system 130 analyzes URLs and webpage content to determine whether a webpage provided by a third-party server 120 is authentic or counterfeit. In some cases, the detection system 130 is configured as part of an enterprise shared with a plurality of user devices 110, for example communicating with the user devices 110 over a local area network or behind a firewall shared with the user devices 110. In other cases, the detection system 130 is remote and operated independently from the user device 110, for example on one or more cloud-based servers. The detection system 130 can instead be operated by the user device 110, as an application external to the browser 112. The detection system 130 may also provide the browser extension 116 for download by the user device 110.

In general, the counterfeit URL detection system 130 applies a trained model to content extracted from or associated with a webpage. When applied to a set of data associated with a URL, the model outputs a score indicating a likelihood that the URL is counterfeit. The detection system 130 uses the score to generate an assessment indicating either that the URL is counterfeit or not counterfeit, and returns the assessment to the browser extension 116. The counterfeit URL detection system 130 is described further with respect to FIG. 3.

FIG. 2 is a block diagram illustrating functional modules within the browser extension 116, according to one embodiment. As shown in FIG. 2, the browser extension 116 can include a browser interface 205, a URL analyzer 210, a URL store 215, a behavior monitor 220, and a behavior store 225. Each of the modules can comprise computer program instructions executable by a processor, such as a processor of the user device 110. The browser extension 116 can include additional, fewer, or different modules, and functionality can be distributed differently between the modules.

The browser interface 205 communicates with the browser 112 to receive URLs requested in the browser 112 and block the browser 112 from accessing URLs that are determined to be counterfeit.

The URL analyzer 210 determines whether URLs requested by the browser 112 are counterfeit or authentic. To determine whether a URL is counterfeit, the URL analyzer 210 can access a URL store 215 that stores a list of URLs known to be either trusted or counterfeit. The URL store 215 can comprise a database or listing of URLs, each mapped to an assessment of whether the URL is trusted or counterfeit. The URL store 215 can be stored locally on the user device 110 or on another device accessible to the URL analyzer 210. If the received URL is listed in the URL store 215, the URL analyzer 210 can determine whether the received URL is trusted based on the assessment of the URL in the store 215.

In some cases, a URL that is similar but not identical to a requested URL is stored in the URL store 215, and the URL analyzer 210 matches the requested URL to a similar stored URL based on a heuristic. In one embodiment, the URL analyzer 210 matches the requested URL to the URL in the store 215 if at least a portion of the requested and stored URLs match. A matched portion of the URLs may include at least a domain name. For example, if the requested URL is www.example.com/sub-level, and the URL store 215 identifies the domain www.example.com as a counterfeit URL, the URL analyzer may determine that the requested URL is also counterfeit because it includes at least the counterfeit domain name. In another embodiment, the heuristic applied by the URL analyzer 210 accounts for patterns in counterfeit and authentic URLs listed in the URL store 215. For example, if www.example.com is assessed in the URL store 215 as being authentic but subdomain1.example.com and subdomain2.example.com are assessed as counterfeit, the URL analyzer 210 may determine that subdomain3.example.com is also likely to be counterfeit because it is more similar to the URLs known to be counterfeit than to the authentic URL.
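By way of illustration only, the domain-name matching described above might be sketched as follows. The store contents, the function names `hostname_of` and `lookup_assessment`, and the choice to fall back from an exact match to a host-name match are assumptions made for this example and are not taken from the disclosure; the pattern-based heuristic for subdomains is not shown.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical store contents: a URL or domain name mapped to its assessment.
URL_STORE: Dict[str, str] = {
    "www.example.com": "counterfeit",
    "www.trusted.example": "trusted",
}


def hostname_of(url: str) -> str:
    """Return the lower-cased host of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit parse the leading part as the host
    return (urlsplit(url).hostname or "").lower()


def lookup_assessment(url: str, store: Dict[str, str]) -> Optional[str]:
    """Try an exact match first, then fall back to a domain-name match."""
    if url in store:
        return store[url]
    return store.get(hostname_of(url))  # None when nothing matches


print(lookup_assessment("www.example.com/sub-level", URL_STORE))
```

A return value of `None` corresponds to the case in which the URL has no match in the store and further analysis is needed.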

The URL analyzer 210 can also extract information associated with a received URL to analyze whether the URL is counterfeit. In some embodiments, the URL analyzer 210 extracts the information associated with the URL if the URL is not listed in the URL store 215. In other embodiments, the URL analyzer 210 may extract the information for some or all webpages requested by the browser 112, even if an assessment of the URL is listed in the URL store 215. The extracted information can include content of a webpage referenced by the URL. For example, the URL analyzer 210 can retrieve text from the webpage or any images on the page. The URL analyzer 210 may additionally or alternatively extract information from HTTP requests transmitted by the browser 112 and HTTP responses received by the browser 112. For example, a header and a body can be extracted from both the HTTP request and response. Any information extracted by the URL analyzer 210 is sent to the counterfeit URL detection system 130 for analysis. When an assessment is returned by the detection system 130, the URL analyzer 210 can add the URL and assessment to the URL store 215 and either block or allow access to the webpage based on the assessment.

The behavior monitor 220 captures user behaviors related to the browser 112 and counterfeit websites, and stores the user behaviors in the behavior store 225. The user behaviors can include a number of unique URLs requested by the user of the user device 110 in a specified period of time. In some cases, the behavior monitor 220 can record any URL requested by the browser 112, whether directly entered into the browser 112 by the user or triggered by a user selection of a hyperlink in a webpage or external application such as an email or SMS messaging application. In other cases, the behavior monitor 220 may record only a number of URLs that were requested in response to specified actions. For example, the behavior monitor 220 can record a number of URLs requested in response to a user selection of a hyperlink in an external application, without recording a number of URLs requested in response to a user directly entering the URL into the browser 112.

The user behaviors recorded by the behavior monitor 220 can also include a number of counterfeit webpages blocked, which can be quantified, for example, as a number of webpages blocked in a specified period of time (e.g., three counterfeit URLs blocked in eight hours) or as a rate relative to the number of unique URLs requested (e.g., one counterfeit URL blocked per 100 requested URLs). For each blocked webpage, the browser extension 116 can record the URL of the page and information about the source of the URL. For example, a URL source can indicate whether the user received the URL in an email, in an SMS message, or through another webpage. If received in a message, the behavior monitor 220 can also record information about the sender of the message, such as an email address or phone number of the sender. If received through another webpage, the behavior monitor 220 can record a URL or other identifier of the webpage.

Additional user behaviors recorded by the behavior monitor 220 can include user details associated with the user of the user device 110. These details can include, for example, an identifier of the user (such as a username) or of the user device 110 (such as an IP address or MAC address), or a user-agent string.

FIG. 3 is a block diagram illustrating functional modules within the counterfeit URL detection system 130, according to one embodiment. As shown in FIG. 3, the detection system 130 can include a model 305, a counterfeit assessment module 310, and a user analytics module 315. Each of the modules can comprise computer program instructions executable by a processor. Furthermore, the counterfeit URL detection system 130 can include additional, fewer, or different modules, and functionality can be distributed differently between the modules. For example, the user analytics module 315 may be executed by the user device 110 or a device affiliated with an enterprise including the device 110, rather than the counterfeit URL detection system 130.

The model 305 is a trained object representing mathematical relationships between features related to a URL and a likelihood that the URL is counterfeit. The model 305 can be trained using components of webpages that are known to be counterfeit or not counterfeit. These webpage components, including, for example, one or more of text extracted from a webpage, an image extracted from the webpage, HTTP request and response headers or bodies, or the URL itself, may be grouped into a set of data representing each URL and labeled with an assessment of the webpage's authenticity. Any of a variety of machine learning or statistical techniques can be applied to the labeled webpage components to generate the model 305. In some cases, different algorithms can be applied to different types of webpage components. For example, images extracted from the webpage can be analyzed by image object detection and image recognition algorithms. Text can be analyzed by a natural language processing algorithm. Threat intelligence, either learned or received from an external provider, can supplement these techniques.

The model 305 may be updated periodically, such as once per month or once per year, using new sets of webpage components. Periodic updates allow the model 305 to respond to new techniques used by nefarious actors.

The counterfeit assessment module 310 applies the model 305 to a dataset associated with a URL to determine whether a URL is counterfeit. The dataset, which can be transmitted to the counterfeit assessment module 310 by the browser extension 116, may include components of a webpage referenced by the URL, HTTP requests and responses associated with an attempt by the browser to display the webpage, and/or the URL itself. The counterfeit assessment module 310 applies the model 305 to the dataset and receives a score output by the model 305. Based on the score, the counterfeit assessment module 310 determines whether the URL is counterfeit.

In one embodiment, the counterfeit assessment module 310 determines whether the URL is counterfeit by comparing the score to a threshold. If the score is greater than the threshold, the counterfeit assessment module 310 outputs an assessment that the URL is counterfeit. If the score is less than the threshold, the module 310 outputs an assessment that the URL is not counterfeit.
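As a minimal sketch of the comparison described above, the following assumes a numeric score and a numeric threshold. The function name `assess_score` is hypothetical, and treating a score exactly equal to the threshold as not counterfeit is an assumption of this example, since only the greater-than and less-than cases are stated above.

```python
def assess_score(score: float, threshold: float) -> str:
    """Map a model score to an assessment by comparing it to a threshold."""
    if score > threshold:
        return "counterfeit"
    # Assumption: a score equal to the threshold is treated as not counterfeit.
    return "not counterfeit"


print(assess_score(0.9, 0.75))
print(assess_score(0.6, 0.75))
```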

In another embodiment, the counterfeit assessment module 310 analyzes the score based on a threat tolerance specified by the user of the user device 110, an administrator of an enterprise associated with the user device 110, or another user. If an enterprise has a low threat tolerance (because, for example, the enterprise deals in highly sensitive data), the counterfeit assessment module 310 sets a high threshold score. A lower threshold score can be set for an enterprise that has a high threat tolerance (e.g., because overly cautious URL analysis and blocking would interrupt the workflow of the enterprise). For example, if the model 305 outputs scores from 0 to 1, where a score of 1 indicates certainty that a URL is counterfeit, the counterfeit assessment module 310 may set a threshold of 0.75 when an enterprise or user has a low threat tolerance and a threshold of 0.5 when an enterprise or user has a high threat tolerance.

The user analytics module 315 receives data describing behaviors of users that are associated with URLs and webpages, for example as captured by the behavior monitor 220, and generates analytics that quantify the user behaviors for one or more users. As described above, the user behaviors can include, for example, a number of unique URLs requested by users, a number of counterfeit webpages blocked by the browser extension 116, and sources of the counterfeit URLs. The user analytics module 315 analyzes the behaviors for one or more users over a period of time and outputs a representation of the analyzed behaviors for review by a user, such as the user of the device 110 or an administrator of an enterprise.

In one embodiment, the representation output by the user analytics module 315 includes a list of any users in an enterprise that attempted to access more than a specified number of counterfeit URLs in a specified period of time. For example, the user analytics module 315 identifies, based on the received user behavior data, any user in an enterprise who attempted to access at least five counterfeit URLs in a particular month. As another example, the user analytics module 315 identifies any user in the enterprise for whom counterfeit URLs constituted at least 1% of the total number of URLs accessed by the user in a specified quarter. The users identified by the analytics module 315 can be output to an administrator of the enterprise to, for example, build a list of users to whom to target training efforts.
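The two example criteria above might be applied to recorded behavior data along the following lines. This is a sketch under stated assumptions: the `UserActivity` record, the `flag_users` function, and the decision to flag a user who meets either criterion are hypothetical choices for this example, and the default values simply mirror the figures in the examples above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class UserActivity:
    """Hypothetical per-user summary for one reporting period."""
    user: str
    urls_requested: int
    counterfeit_attempts: int


def flag_users(activity: Iterable[UserActivity],
               min_attempts: int = 5,
               min_rate: float = 0.01) -> List[str]:
    """Return users meeting the count criterion or the rate criterion."""
    flagged = []
    for a in activity:
        # Guard against division by zero for users with no recorded requests.
        rate = a.counterfeit_attempts / a.urls_requested if a.urls_requested else 0.0
        if a.counterfeit_attempts >= min_attempts or rate >= min_rate:
            flagged.append(a.user)
    return flagged


sample = [
    UserActivity("alice", 1000, 6),   # meets the count criterion
    UserActivity("bob", 1000, 2),     # meets neither criterion
    UserActivity("carol", 100, 2),    # meets the rate criterion (2%)
]
print(flag_users(sample))
```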

In another embodiment, the representation output by the user analytics module 315 identifies common sources of counterfeit URLs. The sources identified by the analytics module 315 may be a general category of sources through which one or more users have received a greatest number of counterfeit URLs. For example, the analytics module 315 may determine that 63% of all counterfeit URLs accessed by users in an enterprise during a specified year were contained in an email, while lower percentages of the counterfeit URLs were accessed through SMS messages, webpages, or other sources. Alternatively, the sources identified by the analytics module 315 may include particular originating sources who have provided the highest number of counterfeit URLs accessed by one or more users, or who have provided greater than a threshold number of the counterfeit URLs accessed by the users. These particularized sources may identify, for example, a domain name or IP address that transmits emails containing counterfeit URLs, a telephone number that transmits SMS messages containing counterfeit URLs, or a name or other identifier of a user who has sent messages containing counterfeit URLs. For example, the analytics module 315 may determine that, of the counterfeit URLs accessed by a particular user, a greatest number of them were provided through emails sent from the domain @example.com.
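For illustration, the share of counterfeit URLs attributable to each source category could be computed as sketched below. The function name `source_shares` and the category labels are assumptions made for this example, and the input is taken to be one source label per blocked counterfeit URL.

```python
from collections import Counter
from typing import Dict, Iterable


def source_shares(sources: Iterable[str]) -> Dict[str, float]:
    """Return each source category's fraction of all counterfeit URLs,
    ordered from most to least common."""
    counts = Counter(sources)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: n / total for source, n in counts.most_common()}


# Hypothetical records: one source label per blocked counterfeit URL.
records = ["email"] * 5 + ["sms"] * 3 + ["webpage"] * 2
shares = source_shares(records)
print(next(iter(shares)))  # the most common source category
```

The same tally could be keyed by sender domain or telephone number instead of category to identify particular originating sources.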

Once a common source of counterfeit URLs has been identified, the user analytics module 315 may generate recommendations for reducing user attempts to access counterfeit URLs. In some cases, the analytics module 315 combines the source analytics with analytics identifying the users in an enterprise who were most likely to access a counterfeit URL, providing the enterprise with recommendations for targeted training. For example, if the users in an enterprise who accessed the most counterfeit URLs in a month received most of those counterfeit URLs through SMS messages, the analytics module 315 may recommend that the enterprise train users to identify trusted or untrusted SMS messages. In other cases, the analytics module 315 may recommend particular updates to a security policy, a firewall, or an email spam filter to block messages originating from a source that has provided a significant quantity of counterfeit URLs.

Real-Time Blocking of Counterfeit Websites

FIG. 4 is a flowchart illustrating a process 400 for blocking user access to counterfeit websites in real-time, according to one embodiment. The process 400 can be performed by the user device 110, for example by executing the browser extension 116. The process 400 can include additional, fewer, or different steps, and the steps can be performed in different orders.

As shown in FIG. 4, the browser extension 116 receives 402 a URL from the browser 112. The browser extension 116 can capture the URL from the browser application 112 when the URL is requested in the browser. In some cases, an external application calls the browser 112 to access a URL when a user selects a hyperlink containing the URL in the external application. For example, if the user selects a link in an email, the email application generates a call to the browser application 112 that contains the URL and causes the browser 112 to access a webpage associated with the URL.

The browser extension 116 determines 404 whether the received URL has a match in a URL store 215. The URL store 215 stores assessments of authenticity of each of a plurality of known URLs. The browser extension 116 may determine 404 if the received URL matches any known URL in the store 215 either by searching for a direct match to the received URL or by comparing the received URL to the known URLs using heuristics.

If the received URL is matched to a known URL in the store 215, the browser extension 116 determines 406 if the received URL is counterfeit based on the assessment stored for the matched URL. For example, if the URL store 215 indicates that the matched URL is counterfeit, the browser extension 116 determines that the received URL is also counterfeit.

If the received URL is determined 406 to be counterfeit, the browser extension 116 blocks 408 access to webpage content referenced by the received URL. For example, the browser extension 116 transmits an instruction to the browser application 112 to not request the webpage content, to not display the webpage content, or to stop displaying the webpage content. In some cases, the browser extension 116 redirects the browser 112 away from the webpage associated with the URL, causing the browser to, for example, display a page indicating that the webpage has been blocked. The browser extension 116 can also capture and record any user behaviors related to the attempt to access the URL.

If the received URL is determined 406 to not be counterfeit, the browser extension 116 allows 410 access to content associated with the URL. For example, the browser extension 116 takes no action to interrupt the process in the browser 112 to request and display the webpage content referenced by the URL. User behaviors associated with the URL can also be captured and stored in the behavior store 225.

Returning to step 404, if the received URL does not match any known URLs in the URL store 215, the browser extension 116 extracts 412 content from a webpage referenced by the received URL. The extracted content is sent 414 to the counterfeit URL detection system 130 for analysis, and the browser extension 116 receives 416 an assessment of the URL from the detection system 130. The assessment indicates whether the received URL is counterfeit. If the assessment indicates 418 that the URL is counterfeit, the browser extension 116 blocks 408 access to the webpage and records user behavior. If the assessment indicates that the URL is not counterfeit, the browser extension 116 allows 410 the request and records the user behavior.
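The decision flow of process 400 can be summarized in a short sketch. The names `handle_request`, `extract`, and `assess_remote` are hypothetical stand-ins for the extension's content extraction and its exchange with the detection system 130, the store lookup is simplified to an exact match, and behavior recording is omitted. Storing the returned assessment follows the earlier description of the URL analyzer 210.

```python
from typing import Callable, Dict


def handle_request(url: str,
                   store: Dict[str, str],
                   extract: Callable[[str], str],
                   assess_remote: Callable[[str], str]) -> str:
    """Return "block" or "allow" for a requested URL."""
    assessment = store.get(url)              # step 404: look for a stored match
    if assessment is None:
        content = extract(url)               # step 412: extract webpage content
        assessment = assess_remote(content)  # steps 414-416: send and receive
        store[url] = assessment              # remember the result for next time
    # steps 406/418: decide, then block (408) or allow (410)
    return "block" if assessment == "counterfeit" else "allow"


# Hypothetical stand-ins used only to exercise the flow.
extracted = []

def fake_extract(url: str) -> str:
    extracted.append(url)
    return "page text"

def fake_assess(content: str) -> str:
    return "not counterfeit"

demo_store = {"http://known.example/": "counterfeit"}
print(handle_request("http://known.example/", demo_store, fake_extract, fake_assess))
print(handle_request("http://new.example/", demo_store, fake_extract, fake_assess))
```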

FIG. 5 is a flowchart illustrating a process 500 for analyzing whether URLs are counterfeit, according to one embodiment. The process 500 can be performed by the counterfeit URL detection system 130. The process 500 can include additional, fewer, or different steps, and the steps can be performed in different orders.

As shown in FIG. 5, the detection system 130 receives 502 webpage content from a browser extension 116 executed by a user device 110. The received content can include content extracted from a webpage referenced by a URL requested by a user of the user device. User behaviors collected by the browser extension 116 can also be transmitted to the detection system 130, either in conjunction with the webpage content or asynchronously.

The detection system 130 applies 504 a trained model to the received content. The model is configured to output an assessment indicating whether a URL is counterfeit based on analysis of webpage content associated with the URL. When the model is applied to the received webpage content, the detection system 130 receives an indication that the URL requested on the user device 110 is or is not counterfeit.

The detection system 130 returns 506 the assessment to the browser extension 116, which is configured to block access to the webpage if the assessment indicates that the URL is counterfeit.

The detection system 130 also generates 508 analytics that quantify user behaviors related to URLs. The analytics can include, for example, an identification of users who accessed at least a threshold number of counterfeit URLs in a specified period of time, or an identification of a source that provided at least a threshold number of counterfeit URLs. The analytics can be output for display to an administrator of the detection system 130 or provided as feedback to a user or enterprise, for example to customize training programs or to modify enterprise security policies.

Redirecting a Counterfeit URL

FIG. 6 is a flowchart illustrating a process 600 for analyzing whether URLs are counterfeit and redirecting a browser application to a legitimate URL, according to one embodiment. As noted above, a received URL may be inspected to determine whether the received URL is counterfeit. If the received URL is determined to be counterfeit, a browser application executing on a user device may be redirected to a legitimate URL associated with a legitimate entity. A legitimate entity may include an entity (e.g., a company, institute, university) that is legitimately operating a webpage and providing a service to users (e.g., retailing a product, providing information). In some embodiments, the legitimate entity may include entities that are explicitly authorized and subscribed as being legitimate entities and may receive redirected URL requests from a browser application executing on a user device. In some embodiments, the browser application can execute on a mobile device (e.g., a smartphone) that may communicate using one or more communication channels (e.g., the Internet, Wi-Fi).

Redirecting a received URL to a legitimate URL may include determining whether the content of the received URL counterfeits a known legitimate URL. In some embodiments, redirecting a received URL to a legitimate URL may include comparing content extracted from a webpage associated with the received URL to content extracted from a webpage associated with the legitimate URL. In other words, the characteristics of the content extracted from the received URL (e.g., text, objects detected in images, domain name, a hypertext transfer protocol (HTTP) request header or body, an HTTP response header or body) may be compared with the characteristics of the content extracted from the legitimate URL to determine whether the characteristics have a similarity that exceeds a threshold similarity. A threshold similarity may include an identified number of similar characteristics between the received URL and the legitimate URL that indicates that the received URL has likely attempted to counterfeit or mimic the legitimate URL.

As an example, the characteristics of the website associated with thereceived URL may include a plurality of shoes offered for sale in aspecific format on the webpage and a logo with distinctive features. Thecharacteristics of a website associated with a known legitimate URL of aleading shoe retailer may include similar layout of the plurality ofshoes offered for sale in the specific format shown in the webpage ofthe received URL and a similar logo with many of the same distinctivefeatures as in the webpage of the received URL. In this example, thereceived URL may be attempting to take advantage of the fame and webbrowser traffic to the known legitimate URL. Based on identifying thesimilarity between the received URL and legitimate URL, the browser of auser device may be redirected to the legitimate URL.

The browser application executing on a user device may receive a URLrequest (block 602). The URL request may include a request for aspecific URL.

The browser application may inspect the URL request to determine whether the requested URL matches any URL listed in a URL store (block 604). When a user clicks on a Uniform Resource Locator (URL) link, the URL may be matched against a datastore of URLs using heuristics. The received URL can be compared against a listing of known counterfeit URLs to determine whether the URL matches a known counterfeit URL. In some embodiments, the URL store may include a listing, database, registry, etc., of known or trusted URLs associated with legitimate entities. If the received URL matches a known counterfeit URL in the URL store, the received URL may be treated as a counterfeit URL, and the browser application may determine whether the URL counterfeits a known legitimate URL (block 606).
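
As an illustrative, non-limiting sketch, the lookup of block 604 could be approximated as follows. The normalization rule (lowercasing the host, dropping a leading "www." and a trailing slash) is only one assumed heuristic; the disclosure does not specify which heuristics are used. The names normalize_url, matches_url_store, and KNOWN_COUNTERFEIT, as well as the example URLs, are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to 'host/path' so that trivial variants compare equal.

    This normalization is an assumed heuristic, not one taken from the disclosure.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def matches_url_store(url: str, url_store: set) -> bool:
    """Return True when the normalized URL appears in the store (block 604)."""
    return normalize_url(url) in url_store


# Hypothetical store of known counterfeit URLs, kept in normalized form.
KNOWN_COUNTERFEIT = {
    normalize_url(u)
    for u in ("http://examp1e-shoes.test/sale", "https://www.fake-bank.test")
}
```

A match would send the request to block 606, while a miss would continue to content extraction at block 608.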

If the URL fails to match a URL in the URL store, the browser application may extract webpage content from the URL (block 608). Extracting webpage content may include identifying images, text, HTTP headers, etc., associated with the received URL. A set of characteristics (e.g., detected objects, detected text, features of images, format of the webpage, domain name) may be identified from the extracted content of the received URL.
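
As a minimal sketch of block 608, the following collects visible text, image sources, and HTTP headers from an already-fetched page using only the Python standard library. The class PageContentExtractor, the function extract_content, and the shape of the returned dictionary are assumptions made for illustration; object detection on the images is outside the scope of this sketch.

```python
from html.parser import HTMLParser


class PageContentExtractor(HTMLParser):
    """Collects visible text and image sources from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []
        self.image_sources = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_chunks.append(data.strip())


def extract_content(html: str, headers: dict) -> dict:
    """Bundle text, image sources, and HTTP headers for later analysis."""
    parser = PageContentExtractor()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text_chunks),
        "images": parser.image_sources,
        "headers": dict(headers),
    }
```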

The browser application may send the extracted content to a counterfeitURL detection system (CDS) (block 610). The CDS may include an enginethat utilizes a Graphical Processing Unit (GPU) in combination withvarious techniques (image object detection, natural language processing,threat intelligence, etc.) to determine whether the website iscounterfeit.

The browser application may receive an assessment from the CDS (block 612). The assessment may include a verdict on whether the website is counterfeit. The browser application may determine whether the URL is counterfeit based on the received assessment (block 614). If the URL is not counterfeit (i.e., the webpage is legitimate), the browser application may allow the request, load the requested URL, and record user behavior (block 616).
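
The decision flow of blocks 602 through 616 can be sketched as below. This is an assumed arrangement, not the actual implementation: the Decision enum, the handle_url_request function, and the two injected callables (standing in for content extraction and for the round trip to the CDS) are hypothetical, as is the "counterfeit" key of the assessment.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"                        # block 616: load the requested URL
    FIND_LEGITIMATE_MATCH = "find_match"   # block 606: look for the imitated URL


def handle_url_request(url, known_counterfeit, extract_content, request_assessment):
    """Walk a requested URL through blocks 604-614 and return the next step.

    extract_content(url) and request_assessment(content) are supplied by the
    caller; the assessment is assumed to be a dict with a boolean "counterfeit".
    """
    # Block 604: a known counterfeit URL skips the remote assessment.
    if url in known_counterfeit:
        return Decision.FIND_LEGITIMATE_MATCH

    # Blocks 608-612: extract content, send it to the CDS, receive the verdict.
    content = extract_content(url)
    assessment = request_assessment(content)

    # Block 614: act on the verdict.
    if assessment.get("counterfeit", False):
        return Decision.FIND_LEGITIMATE_MATCH
    return Decision.ALLOW
```

Passing the extraction and assessment steps in as callables keeps the sketch independent of any particular browser API or network transport.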

If the assessment indicates that the URL is counterfeit, the browserapplication may determine whether the URL counterfeits a knownlegitimate URL (block 606). This may include extracting content (e.g.,text, images, video, logos, domain names) from the counterfeit URL toidentify characteristics of the extracted content. Various techniques,such as object detection, image recognition, natural languageprocessing, etc., may be utilized to identify characteristics includedin the extracted content from the counterfeit URL. In some embodiments,the browser application may utilize a central artificial intelligenceengine that is implemented on any of a user device or a cloud-baseddevice or series of interconnected devices.

The characteristics identified from the extracted content of thecounterfeit website may be compared to a plurality of legitimatewebpages. In some embodiments, content of various legitimate webpagesmay be extracted to determine characteristics of each legitimatewebpage. For example, a web indexing process (e.g., a web crawler) mayretrieve content from various legitimate URLs over a network (e.g., theinternet). Extracting content from any of the counterfeit URL and alegitimate URL may include fetching images, text, HTTP requests from theURL and analyzing the extracted content at any of a user device or acloud-based device. The extracted content and identified characteristicsassociated with each of the plurality of legitimate websites may bemaintained in any of a browser extension and the counterfeit URLdetection system.

In some embodiments, a plurality of known legitimate websites may be listed on a primary listing of legitimate webpages. The primary listing of legitimate websites may include websites/URLs associated with legitimate entities that are authorized as legitimate and have subscribed to have requests for counterfeit URLs redirected to the legitimate URL. In some embodiments, the primary listing of known legitimate websites may include known legitimate URLs. The primary listing may include a listing, registry, database, etc. that includes a plurality of legitimate URLs associated with a legitimate entity and characteristics of the content provided on each webpage. In some embodiments, a portion of the plurality of legitimate URLs are included in a primary listing of legitimate URLs that represent legitimate entities subscribed to receive redirected browser extensions.

Determining whether a counterfeit URL is counterfeiting a legitimate URL may include comparing the identified characteristics of the counterfeit URL and the identified characteristics of the legitimate URL. For example, the counterfeit URL may include an image of a logo that is substantially similar to a logo associated with a legitimate entity and shown on a legitimate webpage. As another example, the counterfeit URL may include a listing of items (e.g., shoes for sale) displayed on the webpage. The listing of items displayed on the counterfeit webpage may be similar to a listing of items on a legitimate webpage, indicating that the counterfeit webpage is attempting to counterfeit the legitimate webpage.

In some embodiments, a first set of characteristics of the extractedcontent of a webpage associated with the received URL may be identified.The first set of characteristics may include at least one of detectedobjects, detected text, and detected source information included in theextracted content of the webpage associated with the received URL. Thefirst set of characteristics of the received URL may be compared to aplurality of characteristics associated with the first legitimate URL todetermine whether a number of common characteristics between the firstset of characteristics and the plurality of characteristics exceeds athreshold number, indicative that the first legitimate URL is within thethreshold similarity to the received URL. A threshold similarity mayinclude a specific or predetermined number of common characteristicsbetween the received URL and a legitimate URL that indicate that thereceived URL has attempted to counterfeit the legitimate URL.
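
By way of a hedged example, the comparison described above can be expressed as a set intersection followed by a count. The representation of characteristics as strings and the function names below are assumptions made for illustration only.

```python
def common_characteristics(received: set, legitimate: set) -> int:
    """Count characteristics that appear on both webpages."""
    return len(received & legitimate)


def within_threshold_similarity(received: set, legitimate: set,
                                threshold_number: int) -> bool:
    """True when the number of common characteristics exceeds the threshold number."""
    return common_characteristics(received, legitimate) > threshold_number
```

For instance, if a received webpage and a legitimate webpage share a logo, a product-grid layout, and a block of product text, the number of common characteristics is three, which exceeds a threshold number of two.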

Upon identifying a known legitimate URL that the counterfeit URL is attempting to counterfeit, the browser application may redirect the request to the legitimate URL and record user behavior (block 618). In some embodiments, the browser application may identify a legitimate webpage with content that is within a threshold similarity to the content of a counterfeit webpage, and the user may be redirected to the legitimate webpage. In some embodiments, the user may be redirected to a legitimate webpage listed on the primary listing of legitimate webpages.

If the browser application determines that the counterfeit URL is not counterfeiting a known legitimate URL, the user may be redirected to a legitimate URL included on a bidding space for competing entities (block 620). For example, the browser application may be unable to match the content of a counterfeit URL with a known legitimate entity with a threshold similarity. In this example, a second set of legitimate entities included as part of the competing entities may bid for or present offers to have the user redirected to their URL.

In some embodiments, the user may be redirected to a legitimate URL included in the competing space based on the extracted content of the counterfeit URL and the bidding information associated with the legitimate URL. For example, the counterfeit URL may include content relating to marketing shoes. In this example, various legitimate entities included in a competing space that operate webpages marketing shoes may present bids to have the user browser redirected to the URL of a legitimate entity. The browser application may identify a primary objective of the counterfeit URL (e.g., marketing a specific style of shoe) and identify potential legitimate entities, such that the user is redirected to the URL of the legitimate entity that presented the greatest bid. In this example, the legitimate entity in the competing space that includes the greatest bid value may have the browser application of the user device redirected to the URL of the legitimate entity.

In some embodiments, the listing of known legitimate URLs can be listedon a primary listing of legitimate URLs. If the counterfeit URL is notcounterfeiting any legitimate URL, the browser application can inspect asecondary listing of legitimate URLs to determine if the content of anylegitimate URL listed in the secondary listing of legitimate URLsmatches the counterfeit URL with a threshold similarity. The secondarylisting of legitimate URLs may include legitimate entities that areauthorized but have not subscribed as a known legitimate URL. In someembodiments, the legitimate URLs listed on the secondary listing oflegitimate URLs include URLs included in the competing space that maybid for or provide an offer to have the browser application of a userredirected to a legitimate URL.

In some embodiments, it may be determined that all legitimate URLsincluded in a portion of the plurality of legitimate URLs listed in aprimary listing of legitimate URLs do not exceed the thresholdsimilarity to the extracted content of the received URL. In response,the content of the received URL may be compared with content associatedwith a portion of the plurality of legitimate URLs listed in a secondarylisting of legitimate URLs representing legitimate entities offering abid to receive redirected browser extensions. A first legitimate URLlisted in the secondary listing of legitimate URLs may be identifiedthat includes content that is similar to the content of the received URLthat exceeds a threshold similarity. The received URL may be directed tothe first legitimate URL based on any of the content of the firstlegitimate URL being similar to the received URL exceeding a thresholdsimilarity and a bid/offer associated with the first legitimate URLexceeding any other bid/offer associated with other legitimate URLslisted in the secondary listing of legitimate URLs.
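
One possible way to sketch this fallback from the primary listing to the secondary listing is shown below. The LegitimateEntry record, the choose_redirect function, and the rule of preferring the most similar primary entry are assumptions; only the use of a threshold similarity and of the greatest bid among secondary entries comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LegitimateEntry:
    url: str
    characteristics: frozenset
    bid: float = 0.0  # only meaningful for entries on the secondary listing


def choose_redirect(received: set, primary: list, secondary: list,
                    threshold_number: int) -> Optional[str]:
    """Pick a redirect target, trying the primary listing before the secondary."""

    def shared(entry: LegitimateEntry) -> int:
        return len(received & entry.characteristics)

    # Primary listing: subscribed entities whose content exceeds the threshold.
    primary_hits = [e for e in primary if shared(e) > threshold_number]
    if primary_hits:
        return max(primary_hits, key=shared).url

    # Secondary listing: among sufficiently similar entries, the greatest bid wins.
    secondary_hits = [e for e in secondary if shared(e) > threshold_number]
    if secondary_hits:
        return max(secondary_hits, key=lambda e: e.bid).url

    return None  # no legitimate URL exceeded the threshold similarity
```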

In some embodiments, a primary characteristic of the extracted content of the received URL may be identified. Each legitimate URL included in the secondary listing of legitimate URLs that includes characteristics matching the primary characteristic may also be identified. Redirecting the browser extension to the first legitimate URL may be based on determining that the bid associated with the first legitimate URL is greater than any bids of the other legitimate URLs included in the secondary listing of legitimate URLs that include characteristics matching the primary characteristic. In an embodiment, the secondary listing of legitimate URLs may include a redirect page that includes an advertising space that can be bid for by legitimate entities (or competing peer brands) listed in the secondary listing of legitimate URLs.

The browser application may generate one or more analytics based on the recorded information relating to a user device (block 622). As shown in FIG. 6, when the browser application is either allowed access to the received URL or redirected to a legitimate URL, user behavior is tracked/recorded. Based on the recorded user behavior, analytics may be derived, as discussed in greater detail with respect to FIG. 7.

FIG. 7 illustrates a block diagram of analytics that may be generated, according to one embodiment. A browser application executing on a user device may record user behavior and interactions between the user device and the browser application. Examples of user behavior and interactions that are tracked may include URL requests received by each user, URL requests that were allowed or blocked, URL requests that were redirected to a known legitimate URL, URL requests that were redirected to a legitimate URL, etc.

The browser application may generate one or more analytics based on the recorded information relating to a user device. An analytic may include a number of URLs selected in a time period (block 702). The number of URLs selected in a time period may include a number of requests to access a specified URL on a web browser executing on a browser application. This may represent the number of overall requests for a URL during a time period (e.g., a day, month, year).

An analytic may include a number of counterfeited URLs redirected in atime period (block 704). The number of counterfeited URLs redirected ina time period may include a number of instances that a requested URL wasdetermined to be counterfeit and the browser application was redirectedto a legitimate URL. In some embodiments, the requested URL may bedetermined to be counterfeit by one of matching a received URL with aknown counterfeit URL and receiving an assessment that the received URLis counterfeit from a CDS.

An analytic may include details for each counterfeited URL redirected (block 706). This may include the received URLs deemed to be counterfeit and information relating to these URLs (e.g., the type of URL, the type of content included in the webpage of each URL, why the URL was deemed counterfeit). The details of each counterfeit URL may provide insight as to common mistakes the user is making in requesting a URL.

An analytic may include URL and source information for each counterfeitwebsite detected and redirected (block 708). Examples of URL and sourceinformation may include a website type, content included in the websitesand URLs, HTTP information in the websites, etc.

An analytic may include a number of legitimate URLs that werecounterfeited that were included in either of a primary listing oflegitimate URLs and a secondary listing of legitimate URLs (block 710).As noted above, a primary listing of legitimate URLs may include a listof legitimate entities that are subscribed to receive a redirected URLfrom a browser application of a user device. A secondary listing oflegitimate URLs may include a second listing of legitimate entities thatoffered a bid to have a redirected URL redirect to their legitimate URL.In some embodiments, the requested URL may be redirected to a legitimateURL on the secondary listing of legitimate URLs when no URL listed onthe primary listing of legitimate URLs matches the content of thereceived URL with a specific or threshold similarity.

In some embodiments, the analytics generated for the user device can beincluded in a set of analytics for a client. In this case, the analyticscan include trends and metrics for a plurality of users interacting withcounterfeit URLs that were redirected to legitimate URLs.
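
As a non-limiting illustration of the analytics of FIG. 7, the sketch below aggregates a list of recorded URL events into several of the counts described above. The UrlEvent record, its field names, and the summarize function are hypothetical, and the events are assumed to have already been filtered to the time period of interest.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class UrlEvent:
    requested_url: str
    counterfeit: bool
    redirected_to: Optional[str] = None
    listing: Optional[str] = None  # "primary" or "secondary" when redirected


def summarize(events: list) -> dict:
    """Aggregate recorded events into per-period analytics."""
    redirected = [e for e in events if e.counterfeit and e.redirected_to]
    by_listing = Counter(e.listing for e in redirected if e.listing)
    return {
        "urls_requested": len(events),                    # block 702
        "counterfeit_redirected": len(redirected),        # block 704
        "redirect_details": [                             # block 706
            (e.requested_url, e.redirected_to) for e in redirected
        ],
        "redirected_to_primary": by_listing["primary"],   # related to block 710
        "redirected_to_secondary": by_listing["secondary"],
    }
```

Analytics for a client could then be obtained by applying the same aggregation to the combined events of a plurality of user devices.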

Based on the analytics derived from the user behavior and interactionswith a user device, a social engineering module 712 may be initiated fora user device. A social engineering module 712 may include a series ofinstructions and materials to assist a user interacting with a userdevice to detect and avoid a counterfeit URL and request access tolegitimate URLs. The social engineering module 712 may be displayed onan extension of the browser extension. In some embodiments, the socialengineering module 712 includes a series of best practices that aresuperimposed over a webpage displayed on the web browser, where the userinteracting with the web browser can interact with the best practicesincluded in the social engineering module. In some embodiments, theprogress through the social engineering module 712 may be recorded.

In some embodiments, the social engineering training module 712 may bespecific to a user device. For example, the social engineering trainingprocess may include a set of best practices to avoid a specific type ofcounterfeit URLs based on analytic(s) indicating that the user hasrepeatedly requested a counterfeit URL of that specific type. The socialengineering training process may include multiple sets of best practicesor instructions based on various analytics and trends associated with auser.

In some embodiments, the social engineering training module 712 may beinitiated based on a triggering event identified in the analytics. If ananalytic exceeds a threshold, a corresponding set of instructions forthe social engineering training process may be displayed on the browser.As an example, if a user has requested a number of URLs that have beenredirected to a legitimate URL that exceeds a threshold number, thebrowser application of a user may be extended to include a socialengineering training process that includes a set of instructions foridentifying counterfeit URLs.

In some embodiments, it may be determined from recorded interactionswith a user device that a number of instances that the browser extensionexecuting on the user device was redirected from a received URL to alegitimate URL exceeds a threshold number. Based on the number exceedingthe threshold number, the browser extension may be extended to display asocial engineering module providing instructions to identify legitimateURLs and avoid counterfeit URLs.

Based on the analytics derived from the user behavior and interactionswith a user device, a training module 714 may be initiated on a browserapplication of a user device. A training module 714 may include a seriesof instructions or activities to train the user to identify legitimateURLs and avoid counterfeit URLs. In some embodiments, the trainingmodule may include a series of activities and media (e.g., videos,images) providing examples and interactive instructions on avoidingcounterfeit URLs. A browser extension on a user device may extend toinclude the training module 714. In some embodiments, the trainingmodule may track the progress of the user through the series ofactivities associated with the training module. The training module mayadminister training on phishing prevention best practices to users whoclick on such phishing links, and track training progress.

In some embodiments, the training module 714 may be specific to a user device. For example, the training module may include multiple sets of activities to be completed based on the analytics derived from the user behavior. For example, a first set of activities in the training module can relate to avoiding counterfeit URLs and a second set of activities in the training module can relate to best practices for providing legitimate URL requests.

In some embodiments, the training module 714 may be initiated based on atriggering event identified in the analytics. If an analytic exceeds athreshold, a corresponding training module may be displayed on thebrowser. As an example, if a user has requested a number of counterfeitURLs that exceeds a threshold number, the browser application of a usermay be extended to include a training module that includes a set ofactivities for avoiding counterfeit URLs.
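
The triggering events described for the social engineering module 712 and the training module 714 can be sketched as a comparison of each analytic against its threshold. The metric names, the threshold values, and the modules_to_display function below are assumed for illustration and are not taken from the disclosure.

```python
# Hypothetical mapping: module name -> (analytic name, threshold number).
TRIGGERS = {
    "social_engineering_module": ("redirect_count", 3),
    "training_module": ("counterfeit_request_count", 5),
}


def modules_to_display(analytics: dict, triggers: dict = TRIGGERS) -> list:
    """Return the modules whose triggering analytic exceeds its threshold."""
    return [
        module
        for module, (metric, threshold) in triggers.items()
        if analytics.get(metric, 0) > threshold
    ]
```

With these assumed thresholds, a user device with four recorded redirects and five counterfeit requests would be shown only the social engineering module, because five does not exceed the threshold of five.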

Example Computing Device

FIG. 8 is a block diagram illustrating an example of a processing system 800 in which at least some operations described herein can be implemented. For example, one or more of the user device 110 or counterfeit URL detection system 130 may be implemented as the example processing system 800. The processing system 800 may include one or more central processing units (“processors”) 802, main memory 806, non-volatile memory 810, network adapter 812 (e.g., network interfaces), video display 818, input/output devices 820, control device 822 (e.g., keyboard and pointing devices), drive unit 824 including a storage medium 826, and signal generation device 830 that are communicatively connected to a bus 816. The bus 816 is illustrated as an abstraction that represents any one or more separate physical buses, point-to-point connections, or both connected by appropriate bridges, adapters, or controllers. The bus 816, therefore, can include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus or PCI-Express bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), IIC (I2C) bus, or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, also called “Firewire.”

In various embodiments, the processing system 800 operates as part of auser device, although the processing system 800 may also be connected(e.g., wired or wirelessly) to the user device. In a networkeddeployment, the processing system 800 may operate in the capacity of aserver or a client machine in a client-server network environment, or asa peer machine in a peer-to-peer (or distributed) network environment.

The processing system 800 may be a server computer, a client computer, apersonal computer, a tablet, a laptop computer, a personal digitalassistant (PDA), a cellular phone, a processor, a web appliance, anetwork router, switch or bridge, a console, a hand-held console, agaming device, a music player, network-connected (“smart”) televisions,television-connected devices, or any portable device or machine capableof executing a set of instructions (sequential or otherwise) thatspecify actions to be taken by the processing system 800.

While the main memory 806, non-volatile memory 810, and storage medium 826 (also called a “machine-readable medium”) are shown to be a single medium, the terms “machine-readable medium” and “storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store one or more sets of instructions 828. The terms “machine-readable medium” and “storage medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the computing system and that causes the computing system to perform any one or more of the methodologies of the presently disclosed embodiments.

In general, the routines executed to implement the embodiments of the disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer programs.” The computer programs typically comprise one or more instructions (e.g., instructions 804, 808, 828) set at various times in various memory and storage devices in a computer, and that, when read and executed by one or more processing units or processors 802, cause the processing system 800 to perform operations to execute elements involving the various aspects of the disclosure.

Moreover, while embodiments have been described in the context of fullyfunctioning computers and computer systems, those skilled in the artwill appreciate that the various embodiments are capable of beingdistributed as a program product in a variety of forms, and that thedisclosure applies equally regardless of the particular type of machineor computer-readable media used to actually effect the distribution. Forexample, the technology described herein could be implemented usingvirtual machines or cloud computing services.

Further examples of machine-readable storage media, machine-readablemedia, or computer-readable (storage) media include, but are not limitedto, recordable type media such as volatile and non-volatile memorydevices 810, floppy and other removable disks, hard disk drives, opticaldisks (e.g., Compact Disk Read-Only Memory (CD ROMS), Digital VersatileDisks (DVDs)), and transmission type media, such as digital and analogcommunication links.

The network adapter 812 enables the processing system 800 to mediatedata in a network 814 with an entity that is external to the processingsystem 800 through any known and/or convenient communications protocolsupported by the processing system 800 and the external entity. Thenetwork adapter 812 can include one or more of a network adaptor card, awireless network interface card, a router, an access point, a wirelessrouter, a switch, a multilayer switch, a protocol converter, a gateway,a bridge, bridge router, a hub, a digital media receiver, and/or arepeater.

The network adapter 812 can include a firewall which can, in someembodiments, govern and/or manage permission to access/proxy data in acomputer network, and track varying levels of trust between differentmachines and/or applications. The firewall can be any number of moduleshaving any combination of hardware and/or software components able toenforce a predetermined set of access rights between a particular set ofmachines and applications, machines and machines, and/or applicationsand applications, for example, to regulate the flow of traffic andresource sharing between these varying entities. The firewall mayadditionally manage and/or have access to an access control list whichdetails permissions including for example, the access and operationrights of an object by an individual, a machine, and/or an application,and the circumstances under which the permission rights stand.

As indicated above, the techniques introduced here can be implemented by, for example, programmable circuitry (e.g., one or more microprocessors), programmed with software and/or firmware, entirely in special-purpose hardwired (i.e., non-programmable) circuitry, or in a combination of such forms. Special-purpose circuitry can be in the form of, for example, one or more application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), etc.

From the foregoing, it will be appreciated that specific embodiments ofthe invention have been described herein for purposes of illustration,but that various modifications may be made without deviating from thescope of the invention. Accordingly, the invention is not limited exceptas by the appended claims.

1. A method comprising: receiving, at a browser extension operating in abrowser application, a uniform resource locator (URL) requested withinthe browser application; obtaining information indicating that thereceived URL is a counterfeit URL; comparing content extracted from awebsite associated with the received URL with content associated withwebsites associated with each of a plurality of legitimate URLs based onobtaining information indicating that the received URL is counterfeit;identifying a first legitimate URL included in the plurality oflegitimate URLs whose associated website includes content that exceeds athreshold similarity to the content of the website associated with thereceived URL; and redirecting the browser application to the firstlegitimate URL based on identifying the first legitimate URL withcontent exceeding the threshold similarity to the content of the websiteassociated with the received URL.
 2. The method of claim 1, furthercomprising: transmitting the extracted content from the websiteassociated with the received URL to a counterfeit URL detection systemconfigured to analyze the extracted content, wherein said obtaininginformation includes receiving an assessment from the counterfeit URLdetection system indicating that the received URL is counterfeit.
 3. Themethod of claim 1, further comprising: comparing the received URL with alisting of known counterfeit URLs, wherein said obtaining informationincludes matching the received URL with any counterfeit URL listed inthe listing of known counterfeit URLs, indicating that the received URLis counterfeit.
 4. The method of claim 1, further comprising: responsiveto obtaining information indicating that the received URL iscounterfeit, blocking the browser application from accessing contentassociated with the received URL.
 5. The method of claim 1, wherein a portion of the plurality of legitimate URLs are included in a primary listing of legitimate URLs that represent legitimate entities that are subscribed to receive redirected URL requests, wherein the first legitimate URL is included in the primary listing of legitimate URLs.
 6. The method of claim 1, further comprising: identifying a first set of characteristics of the content extracted from the website associated with the received URL, the first set of characteristics including at least one of detected objects, detected text, and detected source information included in the content extracted from the website associated with the received URL; and identifying a plurality of characteristics of the website associated with the first legitimate URL, wherein said comparing includes: comparing the first set of characteristics with the plurality of characteristics to determine a number of common characteristics between the first set of characteristics and the plurality of characteristics; and determining that the number of common characteristics between the first set of characteristics and the plurality of characteristics exceeds a threshold number of common characteristics representing that the website associated with first legitimate URL is within the threshold similarity to the website associated with the received URL.
 7. The method of claim1, further comprising: determining that all websites associated withlegitimate URLs included in a portion of the plurality of legitimateURLs listed in a primary listing of legitimate URLs do not exceed thethreshold similarity to the extracted content of the website associatedwith the received URL; comparing content extracted from the websiteassociated with the received URL with content associated with web sitesof a portion of the plurality of legitimate URLs listed in a secondarylisting of legitimate URLs, wherein the first legitimate URL is includedin the secondary listing of legitimate URLs.
 8. The method of claim 7,further comprising: identifying a primary characteristic of theextracted content of the website associated with the received URL;identifying each legitimate URL included in the secondary listing oflegitimate URLs whose associated website includes characteristicsmatching the primary characteristic, wherein said redirecting thebrowser application to the first legitimate URL is based on determiningthat the bid associated with the first legitimate URL is greater thanany bids of the other legitimate URLs included in the secondary listingof legitimate URLs whose associated websites include characteristicsmatching the primary characteristic.
 9. The method of claim 1, furthercomprising: recording, by the browser extension, user interactions withone of the received URL and the first legitimate URL; and generatinganalytics that quantify the user interactions with one of the receivedURL and the first legitimate URL.
 10. The method of claim 9, wherein theanalytics include at least one of: a number of URL requests received, anumber of instances that the browser application was redirected from anyreceived URL to a legitimate URL, a number of instances in which thebrowser application was redirected to any legitimate URL listed on aprimary listing of legitimate URLs, and a number of instances in whichthe browser application was redirected to any legitimate URL listed on asecondary listing of legitimate URLs.
 11. The method of claim 1, whereinthe browser extension operating in the browser application is configuredto execute on a smartphone.
 12. A non-transitory computer-readablestorage medium storing a browser extension that comprises computerprogram instructions, the computer program instructions when executed bya processor causing the processor to: receive a uniform resource locator(URL); obtain information indicating that the received URL is acounterfeit URL; extract content from a webpage associated with thereceived URL and from webpages associated with each of a plurality oflegitimate URLs associated with legitimate entities; compare theextracted content from the webpage associated with the received URL andthe extracted content from the webpages associated with each of theplurality of legitimate URLs to identify a first legitimate URL includedin the plurality of legitimate URLs whose associated webpage has asimilarity to the content of the webpage associated with the receivedURL that exceeds a threshold similarity; and redirect a browserapplication to the first legitimate URL based on identifying the firstlegitimate URL.
 13. The non-transitory computer-readable storage medium of claim 12, wherein the computer program instructions, when executed by the processor, further cause the processor to: compare the received URL with a listing of known counterfeit URLs; and transmit the extracted content of the webpage associated with the received URL to a counterfeit URL detection system configured to analyze the extracted content based on failing to match the received URL with any known counterfeit URL listed on the listing of known counterfeit URLs, wherein said obtaining information includes receiving an assessment from the counterfeit URL detection system indicating that the received URL is counterfeit.
 14. The non-transitory computer-readable storage medium of claim 12, wherein the computer program instructions, when executed by the processor, further cause the processor to: identify no legitimate URL, included in a listing of known legitimate URLs that have subscribed to receive redirected browser extensions, that has an associated webpage having a similarity to the extracted content of the webpage associated with the received URL that exceeds the threshold similarity; and inspect extracted content from webpages associated with legitimate URLs included in a secondary listing of legitimate URLs, wherein the first legitimate URL is included in the secondary listing of legitimate URLs.
 15. The non-transitory computer-readable storage medium of claim 12, wherein the computer program instructions, when executed by the processor, further cause the processor to: record, by the browser extension, user behaviors and interactions with the received URL and the first legitimate URL; generate analytics that quantify the user behaviors and interactions with the received URL and the first legitimate URL; and display the analytics on a webpage.
 16. A method comprising: receiving a request to access a webpage associated with a requested uniform resource locator (URL) at a browser extension operating in a browser application on a user device; applying a first model to content extracted from the webpage, the first model trained to output an assessment indicating that the requested URL is counterfeit; determining that content extracted from a webpage associated with a legitimate URL has a similarity to content extracted from the webpage associated with the requested URL that exceeds a threshold similarity; redirecting the browser application from the requested URL to the legitimate URL based on the exceeded threshold similarity; recording interactions between the user device and the requested URL and the legitimate URL; and generating one or more analytics based on the recorded interactions.
 17. The method of claim 16, wherein the analytics include at least one of: a number of URL requests received, a number of instances that the browser application was redirected from any requested URL to a legitimate URL, a number of instances in which the browser application was redirected to any legitimate URL listed on a primary listing of legitimate URLs, and a number of instances in which the browser application was redirected to any legitimate URL listed on a secondary listing of legitimate URLs.
 18. The method of claim 16, wherein the extracted content of any of the webpage associated with the requested URL and the webpage associated with the legitimate URL comprises at least one of: an object detected in an image extracted from the webpage, text extracted from the webpage, a hypertext transfer protocol (HTTP) request header or body, and an HTTP response header or body.
 19. The method of claim 16, further comprising: determining, from the recorded interactions, that a number of instances in which the browser application was redirected from a requested URL to a legitimate URL exceeds a threshold number; and displaying a social engineering module on the user device, the social engineering module providing instructions to identify legitimate URLs and avoid counterfeit URLs.
 20. The method of claim 16, further comprising: determining, from the recorded interactions, that a number of instances in which the browser application was redirected from a requested URL to a legitimate URL exceeds a threshold number; and implementing a training module on the user device, the training module providing a series of instructions to train a user to identify legitimate URLs and avoid counterfeit URLs, wherein the training module tracks a progression of the user through the series of instructions.
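
By way of illustration only, the redirect selection recited in claims 8, 12, and 14 and the counters recited in claims 10 and 17 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the token-set Jaccard measure stands in for whatever similarity measure an embodiment uses, the threshold test stands in for the characteristic matching of claim 8, and the names `Candidate`, `Analytics`, `jaccard`, `best_match`, and `choose_redirect` are hypothetical. The function is assumed to be called only after the received URL has been assessed as counterfeit.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed stand-in measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class Candidate:
    """A legitimate URL with its extracted content and an optional bid."""
    url: str
    content: str
    bid: float = 0.0


@dataclass
class Analytics:
    """Counters corresponding to the analytics listed in claims 10 and 17."""
    requests: int = 0
    redirects: int = 0
    primary_redirects: int = 0
    secondary_redirects: int = 0


def best_match(content: str, listing: List[Candidate], threshold: float,
               rank: Callable[[Candidate], float]) -> Optional[Candidate]:
    """Return the top-ranked candidate whose similarity exceeds the threshold."""
    matches = [c for c in listing if jaccard(content, c.content) > threshold]
    return max(matches, key=rank) if matches else None


def choose_redirect(content: str, primary: List[Candidate],
                    secondary: List[Candidate], threshold: float,
                    stats: Analytics) -> Optional[str]:
    """Pick a redirect target for content extracted from a counterfeit page."""
    stats.requests += 1
    # Primary listing: rank matches by similarity to the counterfeit content.
    target = best_match(content, primary, threshold,
                        rank=lambda c: jaccard(content, c.content))
    if target is not None:
        stats.redirects += 1
        stats.primary_redirects += 1
        return target.url
    # Secondary listing: among matches, the greatest bid wins (cf. claim 8).
    target = best_match(content, secondary, threshold, rank=lambda c: c.bid)
    if target is not None:
        stats.redirects += 1
        stats.secondary_redirects += 1
        return target.url
    return None


stats = Analytics()
primary = [Candidate("https://other.example", "weather forecast today")]
secondary = [
    Candidate("https://a.example", "acme bank login secure portal", bid=1.0),
    Candidate("https://b.example", "acme bank login secure account page", bid=2.0),
]
result = choose_redirect("acme bank login secure account",
                         primary, secondary, 0.5, stats)
```

In this example no primary-listing page exceeds the assumed threshold of 0.5, so the secondary listing is inspected; both secondary pages exceed it, and the one with the greater bid is selected.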